Logical Functions

Motivation

This chapter covers the implementation of simple logical functions in C++ and R. The goal is to show the syntax differences between the two languages and compare their performance. These examples were adapted from @vaughan.

Fair Warning

These functions ignore NA values. Adjustments for handling NA values will be introduced in the sixth chapter.

R already provides efficient versions of the functions covered here. Code optimizations and improvements will be made in later chapters.

Load the Package

I added the functions from the next sections to the ece244 package and then loaded it with the following code:

load_all()

Additional Packages

I used the bench package to compare the performance of the functions. The package was loaded with the following code:

library(bench)

Are Some Values True? (any())

The any() function returns TRUE if there is at least one TRUE element in a vector, and FALSE otherwise. Below is one possible C++ implementation:

[[cpp11::register]] bool any_cpp_(logicals x) {
  int n = x.size();
  
  for (int i = 0; i < n; ++i) {
    if (x[i]) {
      return true;
    }
  }
  return false;
}

Its R equivalent is:

#' Return TRUE if any element in a vector is TRUE (R)
#' @param x logical vector
#' @export
any_r <- function(x) {
  # seq_along() gives an empty sequence for zero-length input, unlike 1:n
  for (i in seq_along(x)) {
    if (x[i]) {
      return(TRUE)
    }
  }
  FALSE
}

To document the C++ function, I added the following wrapper to the R code:

#' Return TRUE if any element in a vector is TRUE (C++)
#' @inheritParams any_r
#' @export
any_cpp <- function(x) {
  any_cpp_(x)
}

To test the functions, I ran the following benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements
y <- x > 2 # the comparison already returns a logical vector

any(y)
[1] TRUE
any_cpp(y)
[1] TRUE
any_r(y)
[1] TRUE
mark(
  any(y),
  any_cpp(y),
  any_r(y)
)
# A tibble: 3 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 any(y)        120ns    130ns  6188530.        0B      0  
2 any_cpp(y)    862ns    941ns   896801.        0B     89.7
3 any_r(y)      442ns    527ns  1624734.    19.5KB      0  
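
The early-exit pattern in any_cpp_() does not depend on cpp11. As a minimal standalone sketch, assuming a plain std::vector<bool> in place of cpp11's logicals (the names any_loop and any_stl are illustrative and are not part of the ece244 package), the same check can be written as a loop or with std::any_of:

```cpp
#include <algorithm>
#include <vector>

// Hand-written loop: stop at the first true element.
bool any_loop(const std::vector<bool>& x) {
  for (bool value : x) {
    if (value) {
      return true;
    }
  }
  return false;
}

// Same result using the standard library algorithm.
bool any_stl(const std::vector<bool>& x) {
  return std::any_of(x.begin(), x.end(), [](bool value) { return value; });
}
```

Both versions return false for an empty vector, which matches any(logical(0)) in R.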

Which Indices are TRUE? (which())

The which() function returns the indices of the TRUE elements in a vector. Here is a possible C++ implementation:

[[cpp11::register]] integers which_cpp_(logicals x) {
  int n = x.size();
  // start from an explicit zero-length vector so that an input without any
  // TRUE value yields integer(0), as which() does
  writable::integers res(static_cast<R_xlen_t>(0));

  for (int i = 0; i < n; ++i) {
    if (x[i]) {
      res.push_back(i + 1);  // R indices are one-based
    }
  }

  return res;
}

Its R equivalent is:

#' Return the indices of the TRUE elements in a vector (R)
#' @param x logical vector
#' @export
which_r <- function(x) {
  # integer(0) is what which() returns when no element is TRUE
  res <- integer(0)

  # seq_along() gives an empty sequence for zero-length input, unlike 1:n
  for (i in seq_along(x)) {
    if (x[i]) {
      res <- c(res, i)
    }
  }

  res
}

To document the C++ function, I added the following wrapper to the R code:

#' Return the indices of the TRUE elements in a vector (C++)
#' @inheritParams which_r
#' @export
which_cpp <- function(x) {
  which_cpp_(x)
}

To test the functions, I ran the following benchmark code in the R console:

which(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
which_cpp(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
which_r(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
mark(
  which(y[1:1000]),
  which_cpp(y[1:1000]),
  which_r(y[1:1000])
)
# A tibble: 3 × 6
  expression                min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 which(y[1:1000])       3.73µs   4.29µs   200654.    13.2KB     60.2
2 which_cpp(y[1:1000])  14.46µs  16.25µs    59158.    13.1KB     17.8
3 which_r(y[1:1000])   111.56µs 127.42µs     7455.   250.9KB     36.4
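
The index-collecting loop can also be tried outside R. The following is a minimal standalone sketch, assuming std::vector in place of the cpp11 types (which_sketch is a hypothetical name, not a package function). It reserves the worst-case capacity once, whereas which_r() grows its result with c(res, i) on every match, which may be one reason for its larger mem_alloc in the benchmark above:

```cpp
#include <cstddef>
#include <vector>

// Collect the one-based positions of the true elements, as which() does in R.
std::vector<int> which_sketch(const std::vector<bool>& x) {
  std::vector<int> res;
  res.reserve(x.size());  // worst case: every element is true

  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i]) {
      res.push_back(static_cast<int>(i) + 1);  // shift to one-based indexing
    }
  }

  return res;  // empty when no element is true
}
```

Reserving up front trades some memory for fewer reallocations; for sparse inputs it may be preferable to let push_back grow the vector on demand.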

Are All Values True? (all())

The all() function checks if all elements in a vector are TRUE. Here is a possible C++ implementation that loops over the vector:

[[cpp11::register]] bool all_cpp_1_(logicals x) {
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    if (!x[i]) {
      return false;
    }
  }
  return true;
}

More concise C++ alternatives are:

[[cpp11::register]] bool all_cpp_2_(logicals x) {
  for (int i = 0; i < x.size(); ++i) {
    if (!x[i]) {
      return false;
    }
  }
  return true;
}

[[cpp11::register]] bool all_cpp_3_(logicals x) {
  for (bool value : x) {
    if (!value) {
      return false;
    }
  }
  return true;
}

[[cpp11::register]] bool all_cpp_4_(logicals x) {
  // std::all_of() is declared in <algorithm>
  return std::all_of(x.begin(), x.end(), [](bool value) { return value; });
}

To avoid typing std:: every time, you can add using namespace std; at the top of src/code.cpp. However, this is not recommended because it brings every name in std into scope and can lead to name conflicts. A better option is a using-declaration such as using std::the_function;, which lets you write the_function instead of std::the_function each time [@akbiggs].
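
As an illustration of the using-declaration, here is a minimal standalone sketch that assumes a plain std::vector<bool> instead of cpp11's logicals (all_true is a hypothetical name). Only all_of is brought into scope, so std::vector still needs its prefix:

```cpp
#include <algorithm>
#include <vector>

using std::all_of;  // import a single name instead of the whole std namespace

// Return true when every element is true (and also for an empty vector).
bool all_true(const std::vector<bool>& x) {
  return all_of(x.begin(), x.end(), [](bool value) { return value; });
}
```

Because the declaration names exactly one function, any other identifier that happens to clash with a name in std is unaffected.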

To test the functions, I ran the following tests and benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements

all(x > 2)
[1] FALSE
all_cpp_1_(x > 2)
[1] FALSE
all_cpp_2_(x > 2)
[1] FALSE
all_cpp_3_(x > 2)
[1] FALSE
all_cpp_4_(x > 2)
[1] FALSE
# also test the TRUE-only case
all(x >= 0)
[1] TRUE
all_cpp_1_(x >= 0)
[1] TRUE
all_cpp_2_(x >= 0)
[1] TRUE
all_cpp_3_(x >= 0)
[1] TRUE
all_cpp_4_(x >= 0)
[1] TRUE
mark(
  all(x > 2),
  all_cpp_1_(x > 2),
  all_cpp_2_(x > 2),
  all_cpp_3_(x > 2),
  all_cpp_4_(x > 2)
)
# A tibble: 5 × 6
  expression             min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 all(x > 2)          2.25ms   2.34ms      426.    3.81MB     61.6
2 all_cpp_1_(x > 2)    2.2ms   2.38ms      418.    3.81MB     69.7
3 all_cpp_2_(x > 2)   2.26ms   2.37ms      420.    3.81MB     61.1
4 all_cpp_3_(x > 2)   2.26ms   2.37ms      420.    3.81MB     61.7
5 all_cpp_4_(x > 2)   2.26ms   2.37ms      423.    3.81MB     67.8

The timings are nearly identical because every expression first evaluates x > 2, which allocates a new logical vector of one million elements (the 3.81MB reported in mem_alloc). The first element of that vector is FALSE, so the C++ implementations return after inspecting a single element, and the benchmark mostly measures the comparison rather than the loops.

References